Fix managed memory misclassified as kDLCUDAHost in DLPack device mapping #1863
Conversation
_smv_get_dl_device() treated all buffers that are both device- and host-accessible as kDLCUDAHost. Managed (unified) memory is also accessible from both, so it was misclassified. CCCL's make_tma_descriptor then rejected the descriptor with "Device type must be kDLCUDA or kDLCUDAManaged".

Preserve the is_managed flag already queried via CU_POINTER_ATTRIBUTE_IS_MANAGED in _query_memory_attrs(), expose it on Buffer, and use it in _smv_get_dl_device() to return kDLCUDAManaged for managed memory.

Fixes: https://nvbugspro.nvidia.com/bug/6044342

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
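The classification change described above can be sketched as follows. This is a hedged illustration, not the actual cuda.core Cython source: the function name `classify` and its boolean parameters are stand-ins, while the DLDeviceType values follow the DLPack specification.

```python
# Hedged sketch (not the real cuda.core source) of the device-classification
# fix: check the managed flag before the device+host-accessible branch.
from enum import IntEnum

class DLDeviceType(IntEnum):
    # Values taken from the DLPack specification.
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAHost = 3
    kDLCUDAManaged = 13

def classify(device_accessible, host_accessible, is_managed, device_id):
    # Before the fix, managed buffers (device- and host-accessible) fell
    # through to the kDLCUDAHost branch. Checking is_managed first,
    # as queried via CU_POINTER_ATTRIBUTE_IS_MANAGED, corrects that.
    if is_managed:
        return (DLDeviceType.kDLCUDAManaged, device_id)
    if device_accessible and host_accessible:
        return (DLDeviceType.kDLCUDAHost, 0)
    if device_accessible:
        return (DLDeviceType.kDLCUDA, device_id)
    return (DLDeviceType.kDLCPU, 0)
```

Ordering is the whole fix: because managed memory satisfies both accessibility flags, the managed check must come first or it can never be reached.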
cpcloud
left a comment
Thanks for working on this. I don't think the current patch closes the managed-buffer tensor-map path yet.
StridedMemoryView.from_any_interface(buffer) on a Buffer still goes through Buffer.__dlpack_device__() and _dlpack.setup_dl_tensor_device(), and both of those still map managed memory to kDLCUDAHost. Because _smv_get_dl_device() prefers view.dl_tensor.device when a DLPack capsule is present, the new branch here never corrects that case.
Please update the buffer-side DLPack export to emit kDLCUDAManaged as well, and keep the classification logic in one place so the Buffer and StridedMemoryView paths stay aligned. The existing managed-buffer tensor-map test already exercises this via _as_view(managed_buf), so that should become the regression guard once the buffer-side export is fixed too.
Update setup_dl_tensor_device() and Buffer.__dlpack_device__() to emit kDLCUDAManaged for managed memory, closing the gap where the Buffer -> DLPack capsule -> StridedMemoryView path still misclassified managed buffers as kDLCUDAHost. Add cross-reference comments to keep the three classification sites aligned. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Extract the duplicated device-type mapping logic from Buffer.__dlpack_device__(), setup_dl_tensor_device(), and _smv_get_dl_device() into a single classify_dl_device() function in _dlpack.pyx. All three call sites now delegate to it. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Fix test_buffer_dunder_dlpack_device_success to expect kDLCUDAManaged for unified memory instead of the old buggy kDLCUDAHost.
- Fix test_buffer_dlpack_failure_clean_up error message to match the unified classify_dl_device error.
- Add test_managed_buffer_dlpack_roundtrip_device_type to cover the Buffer -> DLPack capsule -> StridedMemoryView end-to-end path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
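The centralization and roundtrip idea from the commits above can be simulated without a GPU. In this hedged sketch, classify_dl_device, FakeManagedBuffer, and FakeView are stand-ins rather than the real cuda.core API; only the device-type numbers follow the DLPack spec (kDLCUDAHost = 3, kDLCUDAManaged = 13).

```python
# Hedged, GPU-free simulation: one classification site that both export
# paths delegate to, plus the Buffer -> device -> view roundtrip check.
KDL_CUDA = 2
KDL_CUDA_HOST = 3
KDL_CUDA_MANAGED = 13

def classify_dl_device(device_accessible, host_accessible, is_managed, device_id):
    # Single classification site: keeping it in one place means the Buffer
    # and StridedMemoryView paths cannot drift apart again.
    if is_managed:
        return (KDL_CUDA_MANAGED, device_id)
    if device_accessible and host_accessible:
        return (KDL_CUDA_HOST, 0)
    return (KDL_CUDA, device_id)

class FakeManagedBuffer:
    # Stand-in for a managed cuda.core Buffer: accessible from both sides.
    is_device_accessible = True
    is_host_accessible = True
    is_managed = True
    device_id = 0

    def __dlpack_device__(self):
        # Buffer-side export path delegates to the shared classifier.
        return classify_dl_device(
            self.is_device_accessible,
            self.is_host_accessible,
            self.is_managed,
            self.device_id,
        )

class FakeView:
    # Stand-in for the consumer side: records the exporter's device,
    # mimicking what a DLPack import would observe.
    def __init__(self, buf):
        self.device = buf.__dlpack_device__()

view = FakeView(FakeManagedBuffer())
assert view.device == (KDL_CUDA_MANAGED, 0)  # no longer kDLCUDAHost
```

The roundtrip assertion mirrors what the new managed-buffer test guards: a managed buffer must surface as kDLCUDAManaged on the consumer side, not kDLCUDAHost.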
@cpcloud please re-review
cpcloud
left a comment
The buffer-side DLPack export paths are fixed now, and centralizing the device classification removes the drift risk I was worried about. The new managed-buffer roundtrip test also covers the end-to-end Buffer -> DLPack -> StridedMemoryView path, so this looks good to me.
Summary
- _smv_get_dl_device() classified all device+host-accessible buffers as kDLCUDAHost, including managed (unified) memory, which should be kDLCUDAManaged.
- make_tma_descriptor rejects kDLCUDAHost with "Device type must be kDLCUDA or kDLCUDAManaged", breaking TMA descriptor creation from managed buffers.
- Preserve the is_managed flag (already queried via CU_POINTER_ATTRIBUTE_IS_MANAGED in _query_memory_attrs()) in _MemAttrs, expose it on Buffer, and use it in _smv_get_dl_device() to return kDLCUDAManaged.

Fixes: https://nvbugspro.nvidia.com/bug/6044342
🤖 Generated with Claude Code